2.8K Downloads
a 7B Vision Language Model (VLM) from the Qwen2.5 family
Trained for tool use
Last Updated 16 days ago
Qwen2.5-VL-7B-Instruct is a vision-language model that processes images, text, and video, supporting structured outputs and visual localization. It can analyze charts, graphics, and layouts, and is capable of temporal reasoning over long video sequences.
The model is intended for use in document analysis, event detection, and extracting structured data from visual content. Outputs include bounding boxes, points, and structured JSON data.
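As a sketch of how structured extraction from an image might look in practice, the snippet below builds an OpenAI-compatible chat request with an inline base64 image, suitable for a local LM Studio server (by default at http://localhost:1234/v1). The model identifier string, the `response_format` constraint, and the helper name `build_vision_request` are assumptions for illustration, not part of this listing; adjust them to your setup.

```python
import base64

def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "qwen2.5-vl-7b-instruct") -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image.

    The model name and JSON-mode flag below are assumptions; check the
    identifiers your local server actually exposes.
    """
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        # Ask the server to constrain the reply to valid JSON (if supported).
        "response_format": {"type": "json_object"},
    }

payload = build_vision_request(
    b"\x89PNG...",  # placeholder bytes; load a real image file here
    "Extract each bar's label and value from this chart as JSON.",
)
# Send with e.g. requests.post("http://localhost:1234/v1/chat/completions", json=payload)
```

The returned message content would then contain the structured JSON (labels, values, bounding boxes) the model produced for the image.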